Abstract

The famous Lovász Local Lemma [EL75] is a powerful tool to non-constructively prove the existence of
combinatorial objects meeting a prescribed collection of criteria. Kratochvíl et al. applied this technique
to prove that a $k$-CNF in which each variable appears at most $2^k/(ek)$ times is always satisfiable [KST93].
In a breakthrough paper, Beck found that if we lower the occurrences to $O(2^{k/48}/k)$, then a deterministic
polynomial-time algorithm can find a satisfying assignment to such an instance [Bec91]. Alon randomized
the algorithm and required $O(2^{k/8}/k)$ occurrences [Alo91]. In [Mos06], we exhibited a refinement of his
method which copes with $O(2^{k/6}/k)$ of them. The hitherto best known randomized algorithm is due to
Srinivasan and is capable of solving $O(2^{k/4}/k)$ occurrence instances [Sri08]. Answering two questions
asked by Srinivasan, we shall now present an approach that tolerates $O(2^{k/2}/k)$ occurrences per variable
and which can most easily be derandomized. The new algorithm is based on an alternative type of witness
tree structure and drops a number of limiting aspects common to all previous methods.

Key Words and Phrases. Lovász Local Lemma, derandomization, bounded occurrence SAT instances,
hypergraph colouring. 



1 Introduction



We assume an infinite supply of propositional variables. A literal $L$ is a variable $x$ or a complemented
variable $\bar{x}$. A finite set $D$ of literals over pairwise distinct variables is called a clause. We say that a
variable $x$ occurs in $D$ if $x \in D$ or $\bar{x} \in D$. A finite set $\varphi$ of clauses is called a formula or a CNF
(Conjunctive Normal Form). We say that $\varphi$ is a $k$-CNF if every clause has size exactly $k$. We say that the
variable $x$ occurs $j$ times in a formula if there are exactly $j$ clauses in which $x$ occurs. We write $\mathrm{var}(\varphi)$ to
denote the set of all variables occurring in $\varphi$.

A truth assignment is a function $\alpha : \mathrm{var}(\varphi) \to \{0, 1\}$ which assigns a boolean value to each variable. A
literal $L = x$ (or $L = \bar{x}$) is satisfied by $\alpha$ if $\alpha(x) = 1$ (or $\alpha(x) = 0$). A clause is satisfied by $\alpha$ if it contains
a satisfied literal, and a formula is satisfied if all of its clauses are. A formula is satisfiable if there exists a
satisfying truth assignment to its variables.
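The definitions above can be made concrete with a small sketch. The encoding is an assumption chosen for illustration and is not taken from the paper: variable $x_i$ is the positive integer `i`, the literal $x_i$ is `+i`, its complement is `-i`, and a clause is a `frozenset` of such integers. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable

# Assumed encoding (not from the paper): literal x_i is +i, its complement is -i.
Clause = FrozenSet[int]


def literal_satisfied(lit: int, alpha: Dict[int, int]) -> bool:
    """x is satisfied if alpha(x) = 1, the complemented literal if alpha(x) = 0."""
    return alpha[abs(lit)] == (1 if lit > 0 else 0)


def clause_satisfied(clause: Clause, alpha: Dict[int, int]) -> bool:
    """A clause is satisfied if it contains at least one satisfied literal."""
    return any(literal_satisfied(lit, alpha) for lit in clause)


def formula_satisfied(phi: Iterable[Clause], alpha: Dict[int, int]) -> bool:
    """A formula is satisfied if all of its clauses are."""
    return all(clause_satisfied(clause, alpha) for clause in phi)


def occurrences(phi: Iterable[Clause], x: int) -> int:
    """Number of clauses in which variable x occurs (positively or complemented)."""
    return sum(1 for clause in phi if x in clause or -x in clause)
```

For instance, with `phi = [frozenset({1, -2}), frozenset({2, 3})]`, variable 2 occurs twice, and the assignment `{1: 0, 2: 0, 3: 1}` satisfies both clauses.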

In [KST93], Kratochvíl et al. applied the Lovász Local Lemma (from [EL75]) to prove that every
$k$-CNF in which every variable occurs no more than $2^k/(ek)$ times has a satisfying assignment. The Local
Lemma currently appears to be the only workable tool for obtaining such a bound. Unfortunately, all
known proofs based on the Local Lemma are non-constructive and do not directly allow for the construction
of an efficient algorithm to actually find a satisfying assignment.

Such an algorithm was first provided by Beck in [Bec91] and then randomized by Alon [Alo91]. Their
algorithm is based on the principle of selecting a preliminary assignment and then discriminating clauses
according to whether few or many of their literals are satisfied by it. The bad clauses containing few
satisfied literals are then reassessed and their variables are reassigned new values. Such a procedure is
efficient if the dependencies are low enough to guarantee that clustered components of bad clauses
are very small and can be solved in a brute force fashion. The original approach by Beck required that no
variable appear more than $O(2^{k/48}/k)$ times. In Alon's simplification, the requirement was
still $O(2^{k/8}/k)$.

Several authors have improved and extended those approaches, with goals somewhat complementary
to what we are concerned with. In [MR98], for example, Molloy and Reed extrapolate guidelines which allow for the
construction of an algorithmic version of numerous applications of the Local Lemma other than bounded
occurrence SAT or hypergraph 2-colouring (which are almost identical problems). Czumaj and Scheideler give
alternative approaches that allow for non-uniform clause or edge sizes [CS00]. We do not investigate these
variations more closely as our current goal is to further improve on the occurrence bound.

The hitherto most powerful approach has recently been published by Srinivasan [Sri08]. It allows
one to find a satisfying assignment to a $k$-CNF in which every variable has no more than $O(2^{k/4}/k)$
occurrences. Srinivasan achieves the improvement by critically augmenting and refining the 2,3-tree based
witness structures used by Alon. The resulting algorithm is inherently randomized.

Our contribution is based on getting rid of 2,3-trees altogether and replacing them by a substantially
denser witness tree structure. Moreover, we drop the distinction of bad, dangerous and safe
clauses and base the decision of which components to reassign on the witness structures themselves. As a
crucial new aspect of our method, those structures are made part of the algorithm itself, rather than
just appearing in the probabilistic analysis.

2 A randomized approach 

Theorem 2.1. There exists a randomized algorithm that finds a satisfying assignment to any $k$-CNF $\varphi$ in
which no variable occurs more than $2^{k/2}/(36k)$ times in expected time polynomial in $|\varphi|$.

Let $n$ be the number of variables and $m$ the number of clauses of $\varphi$. Let an arbitrary ordering be
imposed on the clauses of $\varphi$ (we will refer to it as the lexicographic ordering). Construct the multigraph
$G[\varphi]$ in which every clause of $\varphi$ is a vertex and any pair of identical or complementary literals in two distinct
clauses induces an edge.
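As an illustration of this construction, here is a minimal sketch that builds the multigraph as a map from clause pairs to edge multiplicities. It assumes clauses are `frozenset`s of signed integers (`+i` for a variable, `-i` for its complement) and that the list order of `phi` plays the role of the lexicographic ordering; both conventions and the function name are assumptions made for this example. Since the literals of a clause range over pairwise distinct variables, each shared variable contributes exactly one pair of identical or complementary literals.

```python
from collections import Counter
from itertools import combinations


def dependency_multigraph(phi):
    """Map each pair (i, j), i < j, of clause indices to the number of edges
    between them, i.e. the number of variables the two clauses share."""
    edges = Counter()
    for i, j in combinations(range(len(phi)), 2):
        vars_i = {abs(lit) for lit in phi[i]}
        vars_j = {abs(lit) for lit in phi[j]}
        shared = len(vars_i & vars_j)
        if shared:
            edges[(i, j)] = shared
    return edges
```

With `phi = [frozenset({1, 2}), frozenset({-1, -2}), frozenset({3, 4})]`, the first two clauses are joined by two parallel edges and the third clause is isolated.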

We now describe a randomized algorithm that will find a satisfying assignment to $\varphi$. Just as in all
the previous approaches based on the principle due to Beck and Alon, the algorithm will select a (random) 
preliminary assignment and then solve locally bounded residual components around dissatisfied clauses. In 
contrast to the said approaches, our algorithm will select these components much more restrictively and it 
will actively construct a collection of witness trees that allow for a precise probabilistic analysis. 

In the description of Algorithm 1, a (primary) witness tree $T$ is simply a subtree of $G[\varphi]$ with a
designated root vertex. 

For the performance analysis to follow, let a collection $\mathcal{T}$ of witness trees be given (just as constructed
in the algorithm). Denote $V(\mathcal{T}) := \bigcup_{T \in \mathcal{T}} V(T)$ and $\mathrm{var}(\mathcal{T}) := \bigcup_{D \in V(\mathcal{T})} \mathrm{var}(D)$. Let the natural ordering $\pi$
of $V(\mathcal{T})$ be defined as follows: the vertices $V(T)$ of each tree $T \in \mathcal{T}$ appear consecutively in $\pi$ and they are
ordered in a level-by-level fashion just as in a BFS starting at the root vertex, where siblings are ordered



Algorithm 1 Local Component Solver 



Input: the formula $\varphi$

Output: a satisfying assignment $\alpha$ to $\varphi$

STEP I: PRELIMINARY ASSIGNMENT

1 Select a preliminary assignment $\alpha \in \{0, 1\}^{\mathrm{var}(\varphi)}$ uniformly at random.

STEP II: CONSTRUCTION OF WITNESS TREES

Initialize an empty queue $Q$ of clauses and an empty collection of primary witnesses and then:

2 select the lexicographically first dissatisfied clause in $\varphi$ as the root of a new primary witness tree
$T$. Enumerate all the neighbours of the clause in the canonical lexicographic ordering and enqueue
each of them in $Q$ unless it is already a member of some tree

3 dequeue the next clause $D$ from $Q$. If

i. $D$ contains at least $k/2$ variables that do not yet occur in any tree and
ii. all literals in $D$ over variables not yet occurring in any tree are dissatisfied by $\alpha$,

then add $D$ to the tree (attaching it in the natural way to the parent by which it was enqueued).
Enumerate all the neighbours of the clause in the canonical lexicographic ordering and enqueue
each of them in $Q$ that is not yet a member of any tree. If $D$ does not satisfy the requirements,
simply skip it (note that it might be enqueued again later).

4 if $Q$ is non-empty, repeat (3); go to the next step once $Q$ is exhausted

5 if there is any dissatisfied clause left in $\varphi$ that is neither a member of nor a neighbour to any
tree yet, jump back to (2), starting construction of a new primary witness using this clause as the
root (if there are multiple candidates use the lexicographically first one); pass on to (6) once no
dissatisfied non-neighbour is available anymore

STEP III: DISSECTION

Let us hereafter call a variable $x$ covered if there exists a witness tree $T$ in the collection built such that
$x$ occurs in a clause included in that tree. We say that a literal is covered if the underlying variable is
covered.

6 inspect every clause of the formula that has not yet been added to any tree. Distinguish: if the
clause has any satisfied literal over a variable not covered by the witness collection built,
delete it (completely, from the formula). If it doesn't, truncate it to contain only literals over
covered variables. After all clauses have been processed, only covered variables are left in the
resulting formula $\varphi'$.

STEP IV: LOCAL EXHAUSTIVE ENUMERATION

7 inspect the connected components of $G[\varphi']$. If any of them contains at least $k \log(4m)$ variables,
cancel the current run and restart from the beginning, sampling another preliminary assignment
$\alpha$. Otherwise, enumerate all assignments for every component exhaustively, stop at a satisfying
one and locally replace $\alpha$ by that assignment.



lexicographically. The trees among each other appear according to the lexicographic ordering of their root 
vertices. Note that the natural ordering is exactly the one in which the algorithm adds vertices to the tree 
collection it constructs and yet $\pi$ is fully determined by the shape of $\mathcal{T}$; it is not necessary to consider the
construction history. 
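Steps II and III of Algorithm 1 can be sketched as follows. This is an illustrative reading of the pseudocode, not the author's implementation. It assumes clauses are `frozenset`s of signed integers (`+i` for a variable, `-i` for its complement), that the list order of `phi` is the lexicographic ordering, and that `alpha` is a dict from variables to 0 or 1. All function names are hypothetical. The dict `parent` records trees in insertion order, which is the natural ordering described above.

```python
from collections import deque


def variables(clause):
    """Set of variables underlying the literals of a clause."""
    return {abs(lit) for lit in clause}


def dissatisfied(lit, alpha):
    """A positive literal is dissatisfied by value 0, a complemented one by 1."""
    return alpha[abs(lit)] != (1 if lit > 0 else 0)


def build_witness_trees(phi, alpha, k):
    """Step II: returns (parent, covered). parent maps each tree clause index to
    its parent index (None for roots); covered is the set of covered variables."""
    n = len(phi)
    nbrs = [[j for j in range(n) if j != i and variables(phi[i]) & variables(phi[j])]
            for i in range(n)]
    parent, covered = {}, set()

    def add(i, p):
        parent[i] = p
        covered.update(variables(phi[i]))

    def eligible_root(i):
        # Dissatisfied clause that is neither in a tree nor a neighbour of one.
        return (all(dissatisfied(lit, alpha) for lit in phi[i])
                and i not in parent
                and not any(j in parent for j in nbrs[i]))

    while True:
        # Steps 2 and 5: lexicographically first eligible root, if any.
        root = next((i for i in range(n) if eligible_root(i)), None)
        if root is None:
            return parent, covered
        add(root, None)
        queue = deque((j, root) for j in nbrs[root])
        while queue:  # Steps 3 and 4
            d, p = queue.popleft()
            if d in parent:
                continue
            fresh = [lit for lit in phi[d] if abs(lit) not in covered]
            if 2 * len(fresh) >= k and all(dissatisfied(lit, alpha) for lit in fresh):
                add(d, p)
                queue.extend((j, d) for j in nbrs[d] if j not in parent)


def dissect(phi, alpha, parent, covered):
    """Step III: delete clauses with a satisfied uncovered literal, truncate the rest."""
    residual = []
    for i, clause in enumerate(phi):
        if i in parent:
            residual.append(clause)  # tree clauses contain covered variables only
        elif all(dissatisfied(lit, alpha) for lit in clause if abs(lit) not in covered):
            residual.append(frozenset(lit for lit in clause if abs(lit) in covered))
        # otherwise some uncovered literal is satisfied and the clause is deleted
    return residual
```

On the chain `[{1, 2}, {2, 3}, {3, 4}]` plus the isolated clause `{5, 6}` with `k = 2` and the all-zero assignment, the sketch grows one tree along the chain and starts a second tree at the isolated clause.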

For a clause $D \in V(\mathcal{T})$ and a variable $x \in \mathrm{var}(D)$, we say that $x$ is novel in $D$ w.r.t. $\mathcal{T}$ if $D$ is the first
clause according to $\pi$ in which $x$ occurs. A literal is novel in a clause if the underlying variable is novel in
that clause.

Let us now define a composite witness $W = (\mathcal{T}, V_E)$ as a collection $\mathcal{T}$ of vertex-disjoint witness trees
together with a set $V_E \subseteq V(G[\varphi]) \setminus V(\mathcal{T})$ of extra vertices such that the following properties are satisfied:

i. $|\mathcal{T}| > |V_E|$

ii. the induced subgraph $G[\varphi][V(\mathcal{T}) \cup V_E]$ is connected

iii. every clause $D \in V(\mathcal{T})$ contains at least $k/2$ novel variables. If $D$ is the root vertex of its tree, then
all of its $k$ variables are novel in $D$.

We define the size $|W|$ of a composite witness $W = (\mathcal{T}, V_E)$ to be $|V(\mathcal{T})| + |V_E|$.

We say that a composite witness occurs w.r.t. an assignment $\alpha$ if all the novel literals that appear
along $\pi$ are dissatisfied by $\alpha$.
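The notions of novel literals and of occurrence can be illustrated with a short sketch. It assumes the tree clauses are already listed in the natural ordering, encoded as `frozenset`s of signed integers (`+i` for a variable, `-i` for its complement); the encoding and the function names are assumptions for this example only.

```python
def novel_literals(ordered_clauses):
    """For each clause in the natural ordering, return the literals whose
    variable has not occurred in any earlier clause of the ordering."""
    seen = set()
    result = []
    for clause in ordered_clauses:
        result.append(frozenset(lit for lit in clause if abs(lit) not in seen))
        seen.update(abs(lit) for lit in clause)
    return result


def occurs(ordered_clauses, alpha):
    """The witness occurs w.r.t. alpha if every novel literal is dissatisfied."""
    return all(alpha[abs(lit)] != (1 if lit > 0 else 0)
               for novel in novel_literals(ordered_clauses)
               for lit in novel)
```

For the ordering `[{1, 2}, {2, 3}]`, variable 2 is novel only in the first clause, so the second clause contributes the single novel literal 3.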

Lemma 2.2. Let $\mathcal{T}_0$ be the collection of witness trees that Algorithm 1 constructs, given $\varphi$ and $\alpha$. For each
connected component $C$ left in $G[\varphi']$, there is a composite witness in $G[\varphi]$ that occurs w.r.t. $\alpha$ and which
contains all the variables in $C$.

Proof. Let $V = \mathrm{var}(C)$ be the set of variables in a given connected component of $G[\varphi']$. Note that by
construction of $\mathcal{T}_0$ and $\varphi'$, each tree $T \in \mathcal{T}_0$ either lies completely in $C$ with $\mathrm{var}(T) \subseteq V$, or it is completely
disjoint, $\mathrm{var}(T) \cap V = \emptyset$.

Let now $\mathcal{T} \subseteq \mathcal{T}_0$ be the set of witness trees which lie inside $C$. Consider the subgraph $G[\varphi][V(\mathcal{T})]$
induced by the vertices of $\mathcal{T}$. If this subgraph is not connected, let us consider all the vertices that are
not part of any tree and let us greedily add some of them to $V_E$ so as to make $G[\varphi][V(\mathcal{T}) \cup V_E]$ connected.
We add a vertex if and only if it decreases the number of connected components, thus $|V_E|$ is
certainly smaller than the initial number of components and therefore smaller than $|\mathcal{T}|$. Hence, $(\mathcal{T}, V_E)$
satisfies properties (i) and (ii) of a composite witness.

To check that it satisfies (iii), recall that the natural ordering $\pi$ is the same ordering as the one in
which Algorithm 1 adds clauses to $\mathcal{T}$. The algorithm will add a clause exclusively if there are at least $k/2$
not-yet-encountered variables contained in it and all of the not-yet-encountered literals are dissatisfied. This
implies not only (iii), but also that the composite witness built occurs w.r.t. $\alpha$. Note that the fact that
the algorithm considers all trees in $\mathcal{T}_0$ previously collected and not only the ones in $\mathcal{T}$ does not influence
which literals are to be classified novel since all the trees in $\mathcal{T}_0 \setminus \mathcal{T}$ are completely disjoint in variables from
the ones in $\mathcal{T}$. This concludes the argument. □



Lemma 2.3. In $G[\varphi]$, there exist at most $m \cdot (2^{k/2}/2)^u$ composite witnesses of size exactly $u$.



Proof. We set out by the observation that there is an injection from the set of composite witnesses $(\mathcal{T}, V_E)$
of size $u$ into the set of triples $(T, c_V, c_E)$ where $T$ is an unrooted subtree of $G[\varphi]$ containing $u$ clauses, $c_V$ is
a 3-colouring (or simply a partition into 3 classes) of the vertices $V(T)$ and $c_E$ a 2-colouring of the edges
$E(T)$. We can build such a tree on top of the vertex set $V(\mathcal{T}) \cup V_E$ by greedily adding edges to the ones
already present in $\mathcal{T}$ until a spanning tree of the connected induced subgraph $G[\varphi][V(\mathcal{T}) \cup V_E]$ is obtained.
Distinguish the edges newly added from the original ones by colouring them differently. Moreover, use $c_V$
to distinguish the vertices that originate from $V_E$, the root vertices of the trees in $\mathcal{T}$ and the remainder of
the vertices from one another. It is obvious that the original composite witness can be reconstructed from
the triple. Therefore it is enough to upper bound the number of such triples.

According to a simple counting exercise by Donald Knuth [Knu69], the infinite labelled and rooted
$d$-ary tree has fewer than $(ed)^u$ distinct rooted subtrees of size $u$. Consider now the $(2^{k/2}/36)$-ary such tree.
It has fewer than $(2^{k/2}/12)^u$ many distinct rooted subtrees with $u$ clauses. Picking any of them and picking
any vertex of $G[\varphi]$ as the root, a subtree $T$ of size $u$ is fully determined: we start at the selected root and
follow the edges that correspond (where the correspondence is such that the first child corresponds to the
(lexicographically) first neighbour in the graph and so forth) to the ones included in the selected subtree
(consider that $G[\varphi]$ has maximum degree smaller than $2^{k/2}/36$). Additionally, we pick a two-colouring of
the edges and a three-colouring of the vertices for which we have in total fewer than $6^u$ choices. This yields
the desired upper bound. □
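The arithmetic behind the bound can be spelled out as follows. This is a worked restatement of the proof's steps under its own assumptions ($d = 2^{k/2}/36$, $e < 3$); the intermediate count $3^u \cdot 2^{u-1}$ for the colourings is filled in here from the fact that a tree on $u$ vertices has $u - 1$ edges.

```latex
\begin{align*}
\#\{\text{rooted subtree shapes of size } u\}
  &< (ed)^u = \left(e \cdot \frac{2^{k/2}}{36}\right)^{u}
   < \left(\frac{2^{k/2}}{12}\right)^{u} && (e < 3),\\
\#\{(c_V, c_E)\}
  &\le 3^{u} \cdot 2^{u-1} < 6^{u},\\
\#\{(T, c_V, c_E)\}
  &< m \cdot \left(\frac{2^{k/2}}{12}\right)^{u} \cdot 6^{u}
   = m \cdot \left(\frac{2^{k/2}}{2}\right)^{u}.
\end{align*}
```

The factor $m$ accounts for the choice of the root vertex among the $m$ clauses of $\varphi$.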

Lemma 2.4. When sampling $\alpha$ u.a.r., with probability at least $1/2$, no connected component with $k \log(4m)$
or more variables is left in $G[\varphi']$. Therefore, the algorithm needs to restart no more than 2 times in
expectation.

Proof. By Lemma 2.2, a connected component of $k \log(4m)$ or more variables implies the existence of a
composite witness containing all those variables which occurs w.r.t. $\alpha$. Since every clause contains $k$ variables,
such a witness needs to have size at least $\log(4m)$. Let us denote by $X_u$ the random variable that counts the number of composite witnesses
of size exactly $u$ which occur w.r.t. $\alpha$. A composite witness $(\mathcal{T}, V_E)$ has $|\mathcal{T}|$ root vertices in which each
literal is novel and $u - |V_E| - |\mathcal{T}|$ vertices with at least $k/2$ novel literals each. This makes a total of at least
$(u - |V_E| - |\mathcal{T}|)\frac{k}{2} + |\mathcal{T}|k = (u - |V_E| + |\mathcal{T}|)\frac{k}{2} > \frac{uk}{2}$ novel literals. For the witness to occur it is required that
all of them be dissatisfied. Hence, a fixed composite witness of size $u$ occurs with probability less than
$2^{-uk/2}$. Applying Lemma 2.3, this yields, by linearity of expectation, $E[X_u] < m \cdot 2^{-u}$. Again by linearity,
the expected total number $X$ of composite witnesses of size $\log(4m)$ or larger that occur w.r.t. $\alpha$ is bounded by



$$E[X] = \sum_{u=\log(4m)}^{m} E[X_u] < m \left(\frac{1}{2}\right)^{\log(2m)} \cdot \sum_{u=1}^{m-\log(2m)} \left(\frac{1}{2}\right)^{u} < \frac{1}{2}.$$



Hence, by Markov's inequality, in at least half of the cases no such witness occurs at all. □
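The final estimate of this proof can be checked numerically with exact rational arithmetic. The sketch below assumes that $\log$ denotes the base-2 logarithm and that non-integer thresholds are rounded up; the paper does not state either convention explicitly, so both are assumptions, as are the function names.

```python
from fractions import Fraction


def ceil_log2(n):
    """Smallest integer t with 2**t >= n, for n >= 1 (no floating point)."""
    return (n - 1).bit_length()


def witness_expectation_bound(m):
    """Upper bound m * sum_{u = ceil(log2(4m))}^{m} 2**(-u) on E[X],
    using E[X_u] < m * 2**(-u) for every witness size u."""
    start = ceil_log2(4 * m)
    return m * sum(Fraction(1, 2 ** u) for u in range(start, m + 1))
```

For example, `witness_expectation_bound(8)` sums the sizes 5 through 8 and stays below 1/2, in line with the lemma.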

Once there are only logarithmically small components, it is obvious that the algorithm can enumerate 
all assignments to them in polynomial time. The only thing left to prove is that this will in all cases yield 
a satisfying assignment. To this end, the following lemma demonstrates that each clause left in $\varphi'$ has size
at least $k/2$. By [KST93], a $(k/2)$-CNF with every variable occurring at most $2^{k/2}/(36k)$ times is always
satisfiable. Therefore, the algorithm will find a satisfying assignment by exhaustive enumeration. 
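The exhaustive enumeration on a single small component can be sketched as below. The code assumes the component is given as a list of clauses encoded as `frozenset`s of signed integers (`+i` for a variable, `-i` for its complement); this encoding and the function name are assumptions for illustration. The running time is exponential in the number of variables of the component, which is why the logarithmic size bound matters.

```python
from itertools import product


def brute_force_component(component):
    """Try all assignments to the variables of one component and return the
    first satisfying one as a dict, or None if the component is unsatisfiable."""
    variables = sorted({abs(lit) for clause in component for lit in clause})
    for bits in product((0, 1), repeat=len(variables)):
        alpha = dict(zip(variables, bits))
        if all(any(alpha[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)
               for clause in component):
            return alpha
    return None
```

The returned partial assignment would then replace the preliminary assignment on the variables of that component only.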

Lemma 2.5. Every clause $D \in \varphi'$ has size $|D| \geq k/2$.

Proof. Clauses can only be smaller than $k$ if they have been truncated at some point. Assume that $D$ is a
truncated clause. Let $D_0 \supset D$ be the same clause before truncation. If $D_0 \setminus D$ contained any satisfied literal,
then $D_0$ would have been deleted instead of truncated, therefore all literals of $D_0 \setminus D$ are dissatisfied. If $D$
were empty, then $D_0$ would have been made the root of a new tree, therefore $D_0$ is in the neighbourhood of
one of the trees constructed by the algorithm and it was therefore enqueued into $Q$ at some point (maybe
multiple times). Consider the point in time when $D_0$ was dequeued from $Q$ for the very last time.
At that point in time, $D_0$ had exactly the same covered and uncovered variables as it has now after
termination of the algorithm (since any new covering would have triggered another enqueueing). And since
all of the uncovered literals are dissatisfied, $D_0$ having more than $k/2$ uncovered literals would be a contradiction since then
it would have been added to the tree at the time of dequeueing. We conclude that $D$ has to have size at
least $k/2$. □

This concludes the proof of Theorem 2.1. □

3 A deterministic alternative 

In this section we will show that there is nothing inherently random to the algorithm exhibited; it can be 
easily derandomized. The derandomization technique is in complete analogy to what Beck used in [Bec91]. 

Theorem 3.1. There exists a deterministic algorithm that finds a satisfying assignment to any $k$-CNF $\varphi$
in which no variable occurs more than $2^{k/2}/(36k)$ times in time polynomial in $|\varphi|$.

The following is a well-known fact. 

Lemma 3.2. Let $\psi$ be any CNF and let $X(\psi)$ denote the random variable counting the number of dissatisfied
clauses under an assignment sampled u.a.r. If $E[X(\psi)] < 1$, then $\psi$ is satisfiable and a satisfying assignment
can be found in time polynomial in $|\psi|$.

Proof. Simply pick any variable $x \in \mathrm{var}(\psi)$ and have the algorithm calculate both $E[X(\psi) \mid x = 0]$ and
$E[X(\psi) \mid x = 1]$. Since the two values must average to $E[X(\psi)] < 1$, at least one of them is smaller than
1. We assign the corresponding value to $x$ and continue recursively. □
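The proof is an instance of the method of conditional expectations, which can be sketched as follows. Clauses are assumed to be `frozenset`s of signed integers (`+i` for a variable, `-i` for its complement) over pairwise distinct variables, and the function names are hypothetical. Under a partial assignment, a clause that is not yet satisfied and has `f` unassigned variables is dissatisfied with probability $2^{-f}$, so the conditional expectation is a sum of such terms.

```python
from fractions import Fraction


def expected_dissatisfied(psi, partial):
    """E[number of dissatisfied clauses] when the variables in `partial` are
    fixed and all remaining variables are sampled uniformly at random."""
    total = Fraction(0)
    for clause in psi:
        free = 0
        satisfied = False
        for lit in clause:
            var = abs(lit)
            if var in partial:
                if partial[var] == (1 if lit > 0 else 0):
                    satisfied = True
                    break
            else:
                free += 1
        if not satisfied:
            total += Fraction(1, 2 ** free)
    return total


def derandomized_assignment(psi):
    """Fix the variables one by one, always choosing the value that does not
    increase the conditional expectation."""
    partial = {}
    for x in sorted({abs(lit) for clause in psi for lit in clause}):
        e0 = expected_dissatisfied(psi, {**partial, x: 0})
        e1 = expected_dissatisfied(psi, {**partial, x: 1})
        partial[x] = 0 if e0 <= e1 else 1
    return partial
```

If the initial expectation is below 1, it stays below 1 after every step, and once all variables are fixed it is an integer, hence 0: the assignment returned satisfies every clause.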

We shall now demonstrate that it is sufficient to consider witnesses in a certain size range to exclude 
all large witnesses. 

Let $W = (\mathcal{T}, V_E)$ and $W' = (\mathcal{T}', V_E')$ be composite witnesses in $\varphi$. We say that $W$ implies $W'$ if for all
$\alpha$ for which $W$ occurs, $W'$ occurs as well.

Lemma 3.3. Let $u > k > 2$. For every composite witness $W$ in $\varphi$ with $|W| \geq u$, there exists a witness $W'$
in $\varphi$ having $u \leq |W'| \leq (k + 1)u$ such that $W$ implies $W'$.

Proof. Assume the contrary and let, for fixed $u$ and $k$, $W = (\mathcal{T}, V_E)$ be a smallest counterexample to the
claim, so let $W$ be a composite witness of size at least $u$ which does not imply any witness $W'$ with a size
in the range $u \leq |W'| \leq (k + 1)u$. We observe that $W$ must be of size larger than $(k + 1)u$ since otherwise
$W' = W$ would fulfill the requirements.

Suppose $V_E \neq \emptyset$. Pick an arbitrary clause $D \in V_E$ and remove it from $V_E$, building $W' := (\mathcal{T}, V_E \setminus \{D\})$.
If $W'$ still is a composite witness, we have found a smaller counterexample which is a contradiction. Therefore
it must hold that $H := G[\varphi][V(\mathcal{T}) \cup V_E \setminus \{D\}]$ is no longer connected. By the structure of $G[\varphi]$ and since
$D$ contains exactly $k$ variables, $H$ cannot have more than $k$ connected components of which the largest
component $C_l$ must have size at least $(|W| - 1)/k \geq u$. If we now select all trees and all vertices from $V_E$
that lie inside $C_l$, then this constitutes a smaller composite witness of size at least $u$, which either has the
desired properties or is a smaller counterexample (it can easily be checked that all properties that define a
composite witness are preserved; most notably, the number of vertices in $V_E \cap V(C_l)$ is smaller than the
number of trees inside $C_l$ since otherwise we would have a superfluous node, removal of which would not
disconnect the structure and hence it would have been spare in $W$ already, contradicting minimality).

Suppose $V_E = \emptyset$. Let us pick the very last clause in $\mathcal{T}$ according to the natural ordering and let us
remove that clause from its tree (if the tree had size 1, simply delete it). By the very same argument as
before, the remainder either stays connected or falls apart into at most $k$ connected components, the largest
of which contains a composite witness of size at least $u$. And since we have removed the very last clause,
all novel literals in the preserved clauses remain novel and so that witness occurs as well. □



Lemma 3.4. Associate with every composite witness $W$ a clause $D_\varphi(W)$ which consists of all novel literals
that occur along the natural ordering. Let $\mathcal{W}_j$ denote the set of all composite witnesses of size exactly $j$
in $\varphi$. Build the formula

$$\psi := \bigcup_{j=\log(2m)}^{(k+1)\log(2m)} D_\varphi[\mathcal{W}_j].$$

Then $\psi$ is satisfiable, a satisfying assignment $\alpha$ to it can be found in time polynomial in $|\varphi|$ and w.r.t. such
an $\alpha$, no composite witness of size $\log(2m)$ or larger occurs in $\varphi$.

Proof. Let $X_j$ denote the random variable counting the number of dissatisfied clauses in the formula $D_\varphi[\mathcal{W}_j]$
when sampling truth assignments uniformly at random. According to our earlier considerations, we have
$|\mathcal{W}_j| \leq m \cdot (2^{k/2}/2)^j$. Every witness in $\mathcal{W}_j$ contains at least $jk/2$ novel literals. Therefore,
$E[X_j] < 2^{-jk/2} \cdot m \cdot (2^{k/2}/2)^j = m 2^{-j}$. Let now $X$ denote the number of dissatisfied clauses in the formula $\psi$ when
sampling assignments to it uniformly at random. We obtain

$$E[X] = \sum_{j=\log(2m)}^{(k+1)\log(2m)} E[X_j] < m \cdot 2^{-\log(m)} \cdot \sum_{i=1}^{k\log(2m)+1} 2^{-i} < 1.$$

By Lemma 3.2, $\psi$ is satisfiable and a satisfying assignment to it can be found in time polynomial in $|\psi|$,
which, in turn, is polynomial in $|\varphi|$.

Now let $\alpha$ be an assignment that satisfies $\psi$. Suppose there is a composite witness $W$ of size $\log(2m)$ or
larger in $\varphi$ that occurs w.r.t. $\alpha$. By Lemma 3.3, there exists $W'$ which occurs as well and for which we have
$\log(2m) \leq |W'| \leq (k + 1)\log(2m)$. This witness is therefore contained in some $\mathcal{W}_j$ that we enumerated and
since $\alpha$ satisfies $\psi$, it satisfies $D_\varphi(W')$, so there must be at least one novel literal in $W'$ which is satisfied.
This contradicts occurrence of $W'$. □

Proof of Theorem 3.1. As in Lemma 3.4, we first enumerate all composite witnesses $W$ with
$\log(2m) \leq |W| \leq (k + 1)\log(2m)$ and build the respective clauses $D_\varphi(W)$ from them, producing $\psi$. We solve $\psi$ using
the strategy of Lemma 3.2 and obtain an assignment $\alpha$. Now that we are sure that no witness of size $\log(2m)$ or larger
occurs w.r.t. $\alpha$, we can safely run the (deterministic) remainder of the algorithm described in Section 2 and we will be sure that
all residual connected components are of logarithmic size. This will solve $\varphi$ in time polynomial in $|\varphi|$. □

4 Conclusion 

We have demonstrated that $O(2^{k/2}/k)$-occurrence instances of $k$-SAT are easy to solve.

Still, there remains a big gap between the class of problems for which the Local Lemma predicts 
a solution and the one for which a polynomial-time search algorithm is known to exist. The main open 
question is whether this gap can be completely closed or whether there is something inherently more difficult 
to the search question as compared to the existence question. 

While various aspects of the present solution suggest that some natural threshold has been hit at $k/2$
and another slight variant of the same method will not bring about another considerable improvement, the 
history of the problem demonstrates that conjectures of this sort ought to be treated with due scepticism. 